Efficient Reductions for Imitation Learning: Supplementary Material

Authors

  • Stéphane Ross
  • J. Andrew Bagnell
Abstract

Let $\epsilon_i = \mathbb{E}_{s \sim d^i_{\pi^*}}[e_{\hat{\pi}}(s)]$ for $i = 1, 2, \dots, T$ denote the expected 0-1 loss of $\hat{\pi}$ at time $i$, so that $\epsilon = \frac{1}{T}\sum_{i=1}^T \epsilon_i$. Note that $\epsilon_t$ corresponds to the probability that $\hat{\pi}$ makes a mistake under the distribution $d^t_{\pi^*}$. Let $p_t$ denote the probability that $\hat{\pi}$ has not made a mistake (with respect to $\pi^*$) in the first $t$ steps, and let $d_t$ denote the distribution of states $\hat{\pi}$ is in at time $t$ conditioned on having made no mistake so far. Let $d'_t$ denote the distribution of states at time $t$ obtained by following $\pi^*$, conditioned on $\hat{\pi}$ having made at least one mistake in the first $t-1$ visited states. Then $d^t_{\pi^*} = p_{t-1} d_t + (1 - p_{t-1}) d'_t$. Now at time $t$, the expected cost of $\hat{\pi}$ is at most 1 if it has made a mistake so far, and $\mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)]$ if it has not, so $J(\hat{\pi}) \le \sum_{t=1}^T \left[ p_{t-1}\, \mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)] + (1 - p_{t-1}) \right]$. Let $e_t$ and $e'_t$ denote the probability that $\hat{\pi}$ makes a mistake under $d_t$ and $d'_t$, respectively. Then $\mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)] \le \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] + e_t$, and since $\epsilon_t = p_{t-1} e_t + (1 - p_{t-1}) e'_t$, we have $p_{t-1} e_t \le \epsilon_t$. Additionally, since $p_t = (1 - e_t) p_{t-1}$, we get $p_t \ge p_{t-1} - \epsilon_t \ge 1 - \sum_{i=1}^t \epsilon_i$, i.e. $1 - p_t \le \sum_{i=1}^t \epsilon_i$. Finally, note that $J(\pi^*) = \sum_{t=1}^T \left[ p_{t-1}\, \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] + (1 - p_{t-1})\, \mathbb{E}_{s \sim d'_t}[C_{\pi^*}(s)] \right]$, so that $\sum_{t=1}^T p_{t-1}\, \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] \le J(\pi^*)$. Using these facts we obtain:
$$
\begin{aligned}
J(\hat{\pi}) &\le \sum_{t=1}^T \left[ p_{t-1}\, \mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)] + (1 - p_{t-1}) \right] \\
&\le \sum_{t=1}^T \left[ p_{t-1}\, \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] + p_{t-1} e_t + (1 - p_{t-1}) \right] \\
&\le J(\pi^*) + \sum_{t=1}^T \sum_{i=1}^t \epsilon_i \\
&\le J(\pi^*) + T \sum_{t=1}^T \epsilon_t \\
&= J(\pi^*) + T^2 \epsilon.
\end{aligned}
$$
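To make the compounding-error behaviour behind this bound concrete, here is a minimal numeric sketch (a toy example, not code from the paper). It assumes a hypothetical worst case in which $\pi^*$ incurs zero cost, $\hat{\pi}$ makes a mistake with independent probability $\epsilon$ at each step while it is still on the expert's trajectory, and $\hat{\pi}$ incurs cost 1 at every step after its first mistake; the names `expected_total_cost`, `T`, and `eps` are illustrative.

```python
# Toy illustration (assumed worst-case setting, not from the paper):
# the expert pi* incurs zero cost, the learned policy hat_pi makes a
# mistake with independent probability eps on each step while it is
# still on the expert's trajectory, and pays cost 1 on every step
# after its first mistake. The expected total cost then compounds
# quadratically in the horizon T, consistent with the T^2 * eps bound.
import numpy as np

def expected_total_cost(T: int, eps: float) -> float:
    # Probability that hat_pi has made at least one mistake in the
    # first t-1 steps, for t = 1, ..., T; in this toy setting that is
    # also the probability it pays cost 1 at step t.
    p_off_track = 1.0 - (1.0 - eps) ** np.arange(T)
    return float(p_off_track.sum())

T, eps = 100, 0.01
J_hat = expected_total_cost(T, eps)   # J(hat_pi) in the toy setting (J(pi*) = 0)
bound = T ** 2 * eps                  # the T^2 * eps term from the bound above
print(f"J(hat_pi) = {J_hat:.2f}  <=  T^2 * eps = {bound:.2f}")
```

For small $\epsilon$ the toy cost is roughly $\epsilon\, T(T-1)/2$, so in this setting the quadratic dependence on the horizon is attained up to a constant factor, which is why the supervised-learning reduction alone cannot give a bound better than $O(T^2 \epsilon)$.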

Related Articles

Active imitation learning: formal and practical reductions to I.I.D. learning

In standard passive imitation learning, the goal is to learn a policy that performs as well as a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the exper...

Efficient Reductions for Imitation Learning

Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a r...

Active Imitation Learning via Reduction to I.I.D. Active Learning

In standard passive imitation learning, the goal is to learn a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the expert about the desired action at indi...

Brain and psychological mediators of imitation: sociocultural versus physical traits

The acquisition of cultural beliefs and practices is fundamental to human societies. The psychological and neural mechanisms underlying cultural acquisition, however, are not well understood. Here we used brain imaging to investigate how others’ physical and sociocultural attributes may influence imitative learning, a critical component of cultural acquisition. While undergoing fMRI, 17 Europea...

A Probabilistic Framework for Model-Based Imitation Learning

Humans and animals use imitation as a mechanism for acquiring knowledge. Recently, several algorithms and models have been proposed for imitation learning in robots and humans. However, few proposals offer a framework for imitation learning in a stochastic environment where the imitator must learn and act under realtime performance constraints. We present a probabilistic framework for imitation...

Publication date: 2010